Cache oblivious storage and access heuristics for blocked matrix-matrix multiplication

Authors

  • Nicolas Bock
  • Emanuel H. Rubensson
  • Pawel Salek
  • Anders M. N. Niklasson
  • Matt Challacombe
Abstract

We investigate the effects of ordering in blocked matrix–matrix multiplication. We find that submatrices do not have to be stored contiguously in memory to achieve near-optimal performance. Instead, it is the choice of execution order of the submatrix multiplications that leads to a speedup of up to four times for small block sizes. This is in contrast to results for single matrix elements showing that contiguous memory allocation quickly becomes irrelevant as the block size increases.
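The setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names (`split_blocks`, `join_blocks`, `morton_key`, `block_multiply`) are hypothetical, and the Z-order (Morton) traversal of the block index triples is an assumed example of an alternative execution order, since the listing does not say which ordering the paper uses. Each submatrix is a separately allocated object, so the blocks are not contiguous in memory, and only the order of the submatrix products changes between runs. In pure Python the sketch shows the structure only; it will not reproduce the reported timings.

```python
from itertools import product


def split_blocks(matrix, b):
    """Cut a square list-of-lists matrix into b x b blocks.

    Every block is its own object, so blocks are not stored contiguously.
    """
    nb = len(matrix) // b
    return {
        (I, J): [row[J * b:(J + 1) * b] for row in matrix[I * b:(I + 1) * b]]
        for I in range(nb) for J in range(nb)
    }


def join_blocks(blocks, nb, b):
    """Reassemble a dict of b x b blocks into one list-of-lists matrix."""
    return [
        sum((blocks[(I, J)][r] for J in range(nb)), [])
        for I in range(nb) for r in range(b)
    ]


def morton_key(i, j, k, bits=10):
    """Interleave the bits of (i, j, k) into a Z-order key (assumed ordering)."""
    key = 0
    for t in range(bits):
        key |= ((i >> t) & 1) << (3 * t + 2)
        key |= ((j >> t) & 1) << (3 * t + 1)
        key |= ((k >> t) & 1) << (3 * t)
    return key


def block_multiply(A, B, nb, b, order):
    """Accumulate C[I,J] += A[I,K] * B[K,J] over the triples (I, J, K) in `order`."""
    C = {(I, J): [[0] * b for _ in range(b)]
         for I in range(nb) for J in range(nb)}
    for I, J, K in order:
        a, bk, c = A[(I, K)], B[(K, J)], C[(I, J)]
        for r in range(b):
            for t in range(b):
                a_rt = a[r][t]
                for s in range(b):
                    c[r][s] += a_rt * bk[t][s]
    return C


# Usage: same blocks, two execution orders, identical product.
n, b = 8, 2
nb = n // b
A = [[i + 2 * j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
A_blocks, B_blocks = split_blocks(A, b), split_blocks(B, b)

row_major_order = list(product(range(nb), repeat=3))
z_order = sorted(row_major_order, key=lambda ijk: morton_key(*ijk))

C_row_major = join_blocks(block_multiply(A_blocks, B_blocks, nb, b, row_major_order), nb, b)
C_z_order = join_blocks(block_multiply(A_blocks, B_blocks, nb, b, z_order), nb, b)
```

Integer entries are used so that both traversals give bit-identical results; with floating-point entries the two orders could differ by rounding, because the partial sums over `K` are accumulated in a different sequence.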

Similar resources

Cache oblivious matrix multiplication using an element ordering based on the Peano curve

One of the keys to tap the full performance potential of current hardware is the optimal utilisation of cache memory. Cache oblivious algorithms are designed to inherently benefit from any underlying hierarchy of caches, but do not need to know about the exact structure of the cache. In this paper, we present a cache oblivious algorithm for matrix multiplication. The algorithm uses a block recu...

Communication - Minimizing Algorithms for Matrix Multiplication

As computers increase in speed, the proportion of time spent on communication between cache and hard drive or between multiple processors continues to rise. For single processors, data must be moved between the processor’s fast-access cache and main memory, an operation that often takes many orders of magnitude longer than any arithmetic operation. When multiple levels of cache are present, a c...

A cache-oblivious sparse matrix–vector multiplication scheme based on the Hilbert curve

The sparse matrix–vector (SpMV) multiplication is an important kernel in many applications. When the sparse matrix used is unstructured, however, standard SpMV multiplication implementations typically are inefficient in terms of cache usage, sometimes working at only a fraction of peak performance. Cache-aware algorithms take information on specifics of the cache architecture as a parameter to ...

Two-dimensional cache-oblivious sparse matrix-vector multiplication

In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. ...

Cache-Oblivious Sparse Matrix--Vector Multiplication by Using Sparse Matrix Partitioning Methods

In this article, we introduce a cache-oblivious method for sparse matrix vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behaviour during sparse matrix vector multiplication. Matrices are assumed to be stored in row-major format, by means ...

Journal:
  • CoRR

Volume: abs/0808.1108

Publication date: 2008